
Journal of Chemical Theory and Computation

American Chemical Society (ACS)

Preprints posted in the last 30 days, ranked by how well they match Journal of Chemical Theory and Computation's content profile, based on 126 papers previously published here. The average preprint has a 0.04% match score for this journal, so anything above that is already an above-average fit.

1
SuBMIT: A Software Toolkit for Facilitating Simulations of Coarse-Grained Structure-Based Models of Biomolecules.

Prakash, D. L.; Banerjee, A.; Gosavi, S.

2026-05-20 biophysics 10.64898/2026.05.18.725912 medRxiv
Top 0.1%
22.9%

Coarse-grained structure-based models (CG-SBMs, or Gō models) are simplified potential energy functions of biomolecules or biomolecular complexes that encode their structure. Molecular dynamics (MD) simulations of such SBMs have been used successfully to study long-timescale dynamics, such as protein and RNA folding and large conformational transitions of biomolecular complexes. SBMs have several advantages: (1) their MD simulations are computationally inexpensive, making extensive sampling easily accessible to many researchers; and (2) they are easy to modify and can be adapted to the specific biomolecular problem under investigation. However, SBM force fields are not usually included in commonly used biomolecular simulation packages, which creates a barrier to their use. Here, we present SuBMIT (Structure Based Models Input Toolkit; https://github.com/sglabncbs/submit), a toolkit for generating coarse-grained SBM input files for MD simulations with GROMACS and OpenMM/OpenSMOG. The different flavors of CG-SBMs available in SuBMIT can generate input files for several kinds of simulations. These include the folding and conformational ensembles of proteins with intrinsically disordered regions, 3D domain swapping in proteins, and the dynamics of RNA-protein assemblies (e.g., simple RNA viruses).
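The sketch below shows one widely used Cα Gō-model ingredient to make the "encoded structure" concrete. It is a 12-10 potential that rewards each native contact, with a minimum of −ε at the native distance σ_ij. This functional form is an assumption for illustration, since the abstract does not state which contact potentials SuBMIT writes out. The helper names are hypothetical.

```python
import math


def native_contact_energy(r: float, sigma: float, epsilon: float = 1.0) -> float:
    """12-10 native-contact term: epsilon * (5 (sigma/r)^12 - 6 (sigma/r)^10).

    The minimum, -epsilon, lies at r == sigma, i.e. the distance observed
    in the native structure (assumed functional form, for illustration).
    """
    x = sigma / r
    return epsilon * (5.0 * x**12 - 6.0 * x**10)


def total_contact_energy(coords, native_pairs, epsilon: float = 1.0) -> float:
    """Sum the contact term over native pairs given as (i, j, sigma_ij).

    `coords` is a list of (x, y, z) bead positions (hypothetical input format).
    """
    energy = 0.0
    for i, j, sigma in native_pairs:
        r = math.dist(coords[i], coords[j])
        energy += native_contact_energy(r, sigma, epsilon)
    return energy
```

Because only native pairs receive attractive terms, the native structure is by construction the global energy minimum. This is what keeps SBM simulations cheap and easy to reinterpret.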

2
Reparameterization of the Amber RNA Force Field Non-Bonded Terms

Puthenpeedikakkal, A. M. K.; Cavender, C. E.; Smith, L. G.; Grossfield, A.; Mathews, D.

2026-05-19 biochemistry 10.64898/2026.05.18.725894 medRxiv
Top 0.1%
22.7%

All-atom molecular dynamics simulations of RNA hold promise for modeling conformational preferences, folding thermodynamics, conformational change kinetics, and binding affinities of small-molecule therapeutics. These simulations rely on a force field, a set of equations and parameters that model the potential energy as a function of conformation using classical mechanics. One popular force field for RNA is Amber OL3. Its underlying parameter set was derived in 1999 and has since received updates to the backbone dihedral parameters. The Amber force field, while frequently used, is known to have limitations; for example, it does not adequately stabilize native structures relative to alternative structures. Here, we present a new approach to fitting the force field's non-bonded parameters, specifically the atom-centered point charges for electrostatics and the Lennard-Jones parameters. The parameters are fit to quantum mechanics (QM) interaction energies calculated with symmetry-adapted perturbation theory (SAPT). Embedded point charges represent the electrostatic field from solvent and adjacent nucleotides. In this pilot study with a limited set of fitting data, we keep the Amber ff99 equations and atom types unchanged. With the revised parameters, we observe improved stability of native structures relative to alternative structures. Native tetraloop conformations, which unfold with the Amber OL3 force field, are stable on the microsecond timescale with our new parameters. We also see improvement in the conformational preferences of tetramers. Importantly, A-form helices are still well modeled, but we observe additional flexibility in an internal loop that is not consistent with NMR data. Overall, we provide evidence that this new approach is promising. It fits RNA force field parameters to SAPT interaction energies, with the native-structure context represented as embedded point charges.
It offers a flexible solution for revising the equations in future work or for extension to other molecules that interact with RNA, such as proteins and small molecules. We call this new set of force field parameters Amber RNA.ROC26.

3
CTGoMartini: A Python Framework for Simulating Biomolecular Conformational Transitions with Go-Martini Models

Yang, S.; Song, C.

2026-05-04 biophysics 10.64898/2026.04.30.721921 medRxiv
Top 0.1%
21.8%

Characterizing conformational transitions between distinct structural states is essential for understanding protein function. It remains challenging, however, because of the timescale limitations of atomistic molecular dynamics. Coarse-grained models such as Martini accelerate sampling, but classical elastic-network or Gō-like restraints often trap proteins in a single energy basin. This precludes the study of transition pathways between distinct functional states. Here, we present CTGoMartini, a comprehensive Python package for simulating protein conformational transitions with Gō-Martini models in explicit membranes. CTGoMartini addresses key limitations of existing approaches by redefining native contacts as a dedicated interaction type, thereby eliminating spurious protein aggregation artifacts in multi-copy simulations. The package implements both switching and multiple-basin approaches (exponential and Hamiltonian mixing) to sample transitions between experimentally defined states. Furthermore, it integrates Hamiltonian replica exchange molecular dynamics (HREMD) with PyMBAR analysis. This enables efficient optimization of the mixing parameters that govern barrier heights and relative state stabilities. We demonstrate CTGoMartini on two biologically significant membrane protein systems:

(1) capturing the inward-open to outward-open transition of the lipid transporter SPNS2, revealing the molecular mechanism of S1P translocation; and
(2) elucidating how membrane surface tension and anionic lipids (POPA, PIP2) modulate the conformational equilibrium of the mechanosensitive ion channel TREK1.

By streamlining model construction, simulation, and analysis, CTGoMartini offers an easy-to-use platform that connects static structural snapshots with their underlying dynamic functional mechanisms.
TOC Graphic: Figure 1
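Exponential mixing is one of the multiple-basin approaches mentioned above. It combines two single-basin potentials, V1 and V2, into one surface. The sketch below assumes the commonly used form V_mix = −(1/β) ln[exp(−β(V1 + ΔV1)) + exp(−β(V2 + ΔV2))]. Here the mixing parameter β and the offsets ΔV1, ΔV2 are the tunable quantities that set barrier height and relative state stability. The abstract does not give CTGoMartini's actual parameterization or function names, so everything below is hypothetical.

```python
import math


def exp_mix(v1: float, v2: float, beta_mix: float, dv1: float = 0.0, dv2: float = 0.0) -> float:
    """Exponentially mixed potential of two basins (assumed standard form).

    V = -(1/beta) * ln( exp(-beta (V1 + dV1)) + exp(-beta (V2 + dV2)) ),
    evaluated with a log-sum-exp shift for numerical stability.
    """
    a = -beta_mix * (v1 + dv1)
    b = -beta_mix * (v2 + dv2)
    m = max(a, b)
    return -(m + math.log(math.exp(a - m) + math.exp(b - m))) / beta_mix


def double_basin_profile(xs, k: float = 1.0, beta_mix: float = 2.0, dv2: float = 0.0):
    """Toy 1D example: harmonic basins centred at x = -1 and x = +1."""
    return [exp_mix(k * (x + 1.0) ** 2, k * (x - 1.0) ** 2, beta_mix, 0.0, dv2) for x in xs]
```

In this form, raising ΔV2 destabilizes the second basin relative to the first. Lowering β reduces the barrier, because at the crossing point V_mix = V − ln 2/β. Scanning such parameters corresponds to the optimization the abstract describes doing with HREMD and PyMBAR.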

4
Benchmarking generative AI and physics based molecular simulation for sampling conformational heterogeneity in T4 Lysozyme

Bhakat, S.

2026-05-13 biophysics 10.64898/2026.05.10.724101 medRxiv
Top 0.1%
15.4%

Wild-type T4 lysozyme (T4L) is used as a benchmark to evaluate conformational sampling by three families of methods: generative AI, AI-accelerated molecular simulation (AMS), and physics-based enhanced molecular dynamics (EMD). A four-state model (exposed/open, exposed/closed, buried/open, and buried/closed) is defined using physically meaningful collective variables. The generative AI methods (AF-cluster, MSA subsampling of AlphaFold2, ConforFold, AlphaFlow, ESMFlow, ConfRover, BioEmu) largely sample only the exposed/open state. AMS integrates generative ensembles with iterative molecular dynamics. It recovers all four states and reproduces equilibrium populations similar to those from EMD and consistent with experimental smFRET signatures.

5
Deep Learning Structural Ensembles as Proxies for Protein Flexibility

Tunc, M. T.; Dizkirici Tekpinar, A.; Tekpinar, M.

2026-05-18 bioinformatics 10.64898/2026.05.16.725658 medRxiv
Top 0.1%
11.9%

Protein dynamics are essential to biological function, yet whether deep learning models contain information about these dynamics remains an open question. In this study, we quantitatively investigate the capacity of deep learning structure generation methods to predict protein flexibility. We directly compare residue-level mean squared fluctuation (MSF) profiles derived from structural ensembles with experimental or simulation-informed flexibility profiles. We assembled four diverse benchmark datasets representing different types of structural information:

- 70 NMR ensembles;
- 43 pairs of X-ray crystallographic protein structures in two distinct conformational states;
- 82 high-resolution cryo-EM structures; and
- molecular dynamics simulations of 10 proteins.

We used AlphaFold3, AlphaFold2, and RoseTTAFold to generate multiple structural models. We applied ranksort normalization to place the profiles on a comparable scale and quantified similarity primarily using cosine similarity and Pearson correlation. Our results show that flexibility predictions from deep learning-generated models agree well with experimental data. This suggests that fluctuations in these predicted ensembles can serve as effective proxies for protein flexibility. Notably, AlphaFold3 consistently produced the best results across the datasets. Flexibility prediction accuracy generally improved as the number of models increased up to 15, and our findings remain robust when terminal residues are excluded from the analysis. To facilitate broader application, we provide three publicly accessible Jupyter Notebooks for calculating MSF from deep learning outputs. Ultimately, this work provides evidence that deep learning structural ensembles can serve as proxies for protein flexibility.
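The residue-level MSF and similarity metrics described above can be sketched in a few lines. This assumes the models are already superposed onto a common frame and that each residue is represented by one atom (e.g. Cα). The ranksort normalization step is omitted, and the function names are illustrative rather than taken from the authors' notebooks.

```python
import math


def msf_profile(models):
    """Per-residue mean squared fluctuation over an ensemble.

    `models` is a list of conformations; each conformation is a list of
    (x, y, z) tuples, one per residue, assumed already superposed.
    """
    n_models = len(models)
    msf = []
    for i in range(len(models[0])):
        # Ensemble-average position of residue i
        mean = [sum(m[i][k] for m in models) / n_models for k in range(3)]
        # Squared deviation of each model's residue i from that mean position
        sq = [sum((m[i][k] - mean[k]) ** 2 for k in range(3)) for m in models]
        msf.append(sum(sq) / n_models)
    return msf


def cosine_similarity(a, b):
    """Cosine of the angle between two profiles treated as vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def pearson(a, b):
    """Pearson correlation, computed as the cosine similarity of mean-centred profiles."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return cosine_similarity([x - ma for x in a], [y - mb for y in b])
```

Cosine similarity is sensitive to the overall offset of the profiles, while Pearson correlation removes it by centering. Reporting both, as the abstract does, distinguishes shape agreement from scale agreement.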

6
Environment-conditioned design of alpha-helical peptides

Conde-Torres, D.; Garcia-Fandino, R.; Pineiro, A.

2026-05-08 biophysics 10.64898/2026.05.07.723485 medRxiv
Top 0.1%
9.9%

Designing peptide sequences that remain stable and selective across heterogeneous environments is a central challenge in biomolecular modeling. Here we introduce an interpretable, physics-based Hamiltonian for environment-conditioned design of α-helical peptide sequences. The model integrates helix propensities, pairwise interactions, electrostatics, anisotropic solvent exposure, and interfacial geometry into a unified energy function. All contributions are rescaled and expressed as Z-scores relative to random sequence ensembles. This enables comparison across sequence lengths and environments and yields a normalized design landscape with balanced physical terms. The formulation defines a structured optimization problem that can be explored with exact, heuristic, and hybrid quantum-classical approaches without modifying the underlying model. The Hamiltonian recovers polar and apolar limits. It discriminates experimentally characterized water-soluble and transmembrane α-helical peptide sequences. It also captures the preferential stabilization of membrane-active sequences, relative to non-functional controls, at anionic interfaces. It further enables multi-objective and selective design, generating candidate sequences with tunable environmental specificity.
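Z-score normalization against random sequence ensembles can be illustrated with a toy energy term. The hydrophobic-count term below is a hypothetical stand-in for one component of the authors' Hamiltonian, and the uniform random sequence model is an assumption. The point is only to show how a raw energy is re-expressed relative to random sequences of the same length.

```python
import random
import statistics

# Hypothetical toy term: one energy unit per hydrophobic residue (not the paper's Hamiltonian)
HYDROPHOBIC = set("AILMFVW")


def toy_energy(seq: str) -> float:
    """Lower (more negative) for more hydrophobic sequences."""
    return -sum(1.0 for aa in seq if aa in HYDROPHOBIC)


def z_score(seq: str, energy_fn, n_random: int = 2000,
            alphabet: str = "ACDEFGHIKLMNPQRSTVWY", seed: int = 0) -> float:
    """Express energy_fn(seq) as a Z-score against random sequences of equal length.

    Random sequences draw residues uniformly from `alphabet` (an assumption).
    """
    rng = random.Random(seed)  # fixed seed for a reproducible reference ensemble
    reference = [
        energy_fn("".join(rng.choice(alphabet) for _ in seq)) for _ in range(n_random)
    ]
    mu = statistics.fmean(reference)
    sd = statistics.stdev(reference)
    return (energy_fn(seq) - mu) / sd
```

Because each term is divided by its own spread over the random ensemble, terms with very different raw magnitudes become comparable. This is the balancing effect the abstract attributes to the Z-score formulation.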

7
Temporal Hydrogen-Bond Network Analysis Reveals Substrate-Directed Connectivity in Dihydrofolate Reductase

Guclu, T. F.; Atilgan, C.; Atilgan, A. R.

2026-05-07 biophysics 10.64898/2026.05.05.722848 medRxiv
Top 0.1%
9.8%

Hydrogen-bond networks are central to protein function, but most network analyses rely on static representations that neglect how interactions evolve in time. Here, we introduce a framework that combines instantaneous and temporal graph analysis of hydrogen-bond networks derived from molecular dynamics trajectories to quantify ligand-directed hydrogen-bond connectivity. We apply the method to E. coli dihydrofolate reductase (DHFR) and its L28R mutant, computing shortest hydrogen-bond paths from all residues to the substrate dihydrofolate (DHF). The instantaneous analysis reveals that DHF-directed connectivity is organized through a sparse set of preferred routes, with D27 and T113 acting as prominent hubs in the wild-type enzyme. Temporal analysis highlights residues that preferentially support time-ordered DHF-directed connectivity. Comparison with L28R shows that the mutation preserves the main substrate-contacting architecture and the overall communication scaffold but redistributes pathway usage, persistence, and temporal convergence. The network-derived hotspots partially overlap with independent coevolution signals, most strongly in the K109-I115 region, while overlap with cryptic-site predictors is more limited. This pattern indicates that the hydrogen-bond network captures evolutionarily supported communication regions in DHFR that are not fully recovered by static structural approaches. The framework is broadly applicable to ligand-binding proteins and provides a route to identify persistent, delayed, and mutation-sensitive signaling pathways directly from time-ordered simulation data.
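For a single trajectory frame, shortest hydrogen-bond paths from every residue to the substrate can be found with a breadth-first search on that frame's hydrogen-bond graph. This sketch assumes the graph is unweighted and undirected and is given as residue-label pairs. The temporal (time-ordered) analysis is not shown, and the toy labels in the test are made up.

```python
from collections import Counter, deque


def shortest_paths_to_target(edges, target):
    """BFS from `target` over an undirected hydrogen-bond graph.

    Returns (dist, next_hop): hop counts to the target and, for each node,
    the neighbour one step closer to the target on a shortest path.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    dist = {target: 0}
    next_hop = {target: None}
    queue = deque([target])
    while queue:
        u = queue.popleft()
        for v in sorted(adj.get(u, ())):  # sorted for deterministic tie-breaking
            if v not in dist:
                dist[v] = dist[u] + 1
                next_hop[v] = u
                queue.append(v)
    return dist, next_hop


def path_to_target(next_hop, node):
    """Follow next_hop pointers from `node` to the target."""
    path = [node]
    while next_hop[path[-1]] is not None:
        path.append(next_hop[path[-1]])
    return path


def hub_counts(next_hop, target):
    """Count how often each residue is an intermediate on a shortest path."""
    counts = Counter()
    for node in next_hop:
        if node == target:
            continue
        for residue in path_to_target(next_hop, node)[1:-1]:
            counts[residue] += 1
    return counts
```

A simple per-frame hub score comes from counting how often each residue appears as an intermediate on these paths. Averaging over frames would give the instantaneous picture, while time-ordered paths need the temporal-graph treatment the abstract describes.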

8
pH Induced Changes in Protein Structure and Hydration

Sen, A.; Chakrabarti, J.; Mitra, R. K.

2026-05-14 biophysics 10.64898/2026.05.13.724817 medRxiv
Top 0.1%
8.3%

The molten globule (MG) state is an intermediate on the unfolding pathway of proteins, typically populated under denaturing conditions such as urea, extreme pH, high pressure, or heat. The microscopic details of such states are far from understood. Here we study the MG states of hen egg-white lysozyme (PDB ID: 1AKI) across a wide pH range. We use microscopic constant-pH molecular dynamics (CpHMD) simulations together with experiments. We observe that the titratable residues act as key drivers of conformational fluctuations, promoting the emergence of MG states at extreme pH. These states display partial unfolding and small global structural changes (< 7% deviation). Hydration around the fluctuating acidic residues shows reduced water density and weakened hydrogen bonding at low pH. At high pH, hydration around acidic residues increases relative to pH 7, whereas hydration around basic residues decreases. The translational and rotational dynamics of hydration water also exhibit pronounced pH dependence. The translational diffusion coefficient (D_trans) increases linearly with decreasing pH in the acidic regime and increases linearly with increasing pH in the basic regime. The rotational diffusion coefficient (D_rot) shows a similar pH dependence, except for a break at pH ≈ 4 corresponding to the pKa values of acidic residues. Our results may be useful for identifying ligand binding to lysozyme under extreme pH conditions.
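Translational diffusion coefficients such as D_trans are commonly obtained from the mean squared displacement (MSD) via the Einstein relation, MSD(t) ≈ 2dDt in d dimensions. The sketch below assumes unwrapped coordinates at equal time spacing and a linear MSD regime. The abstract does not say how the authors computed D_trans, and restricting the analysis to hydration-shell water is not shown. D_rot would require an orientational correlation function instead.

```python
def msd_curve(positions, max_lag):
    """MSD versus lag for one molecule.

    `positions` are unwrapped (x, y, z) coordinates at equal time spacing
    (assumed input format); returns MSD for lags 1..max_lag in frames.
    """
    out = []
    for lag in range(1, max_lag + 1):
        disp = [
            sum((positions[t + lag][k] - positions[t][k]) ** 2 for k in range(3))
            for t in range(len(positions) - lag)
        ]
        out.append(sum(disp) / len(disp))
    return out


def diffusion_coefficient(times, msd, dim=3):
    """Least-squares slope of MSD(t), divided by 2*dim (Einstein relation)."""
    n = len(times)
    mt = sum(times) / n
    mm = sum(msd) / n
    slope = sum((t - mt) * (m - mm) for t, m in zip(times, msd)) / sum(
        (t - mt) ** 2 for t in times
    )
    return slope / (2 * dim)
```

Fitting the slope rather than dividing MSD by t at a single lag makes the estimate insensitive to a constant offset, such as one from short-time ballistic motion.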

9
Coupled Binding and Folding of NS2B/NS3 Protease and Linker Effects Revealed by Topology-based Modeling

Dong, K.; Huang, J.; Chen, M.; Chen, J.

2026-05-07 biophysics 10.64898/2026.05.04.722635 medRxiv
Top 0.2%
7.1%

Orthoflaviviruses, such as West Nile virus (WNV), dengue virus (DENV), and Zika virus (ZIKV), are globally distributed pathogens that pose substantial threats to human health. There are still no effective antiviral drugs for WNV or ZIKV. Although two DENV vaccines are licensed, their use remains limited due to potential risks, highlighting an urgent need for antiviral drug development. The highly conserved orthoflavivirus protease NS2B/NS3 is required for viral replication, making it a promising anti-flavivirus target. A major challenge, however, is the enzyme's highly charged active site, which requires charged chemical matter with low bioavailability. An alternative and more attractive strategy is to target potential allosteric sites or folding intermediate states of the protease. In this work, we employ topology-based coarse-grained Gō modeling to explore the coupled binding and folding pathways of the WNV NS2B/NS3 protease. We also study how the widely used experimental construct, which places a G4SG4 linker between NS2B and NS3, affects stability and folding. Our results provide a holistic conformational landscape of protease binding and folding, including several key intermediate states. We find that the G4SG4 linker alters the folding pathways and destabilizes the NS2B C-terminus. The latter is consistent with experimental observations that the G4SG4-linked protease has lower activity and adopts an open state in substrate-free crystal structures. Together, these findings provide, for the first time, a complete picture of the binding and folding of the NS2B/NS3 protease. They also identify important folding intermediate states that could be targeted for allosteric antiviral drug development.
TOC Figure: Figure 1

10
Bayesian-Steered Structure Prediction of Mechanical Biomolecules Using Twisted Diffusion

Klaus, C.; Sotomayor, M.

2026-05-13 bioinformatics 10.64898/2026.05.11.724187 medRxiv
Top 0.2%
6.2%

Deep learning approaches have revolutionized protein structure prediction. These tools are trained using experimental data and recapitulate reported conformations, but there is great interest in predicting conformations that may be functionally relevant although experimentally underrepresented. Since many modern structure prediction tools use generative artificial intelligence diffusion models, we reframe the search for alternative molecular conformations as that of sampling from a diffusion distribution conditioned using any arbitrary Bayesian likelihood. We implement a twisted diffusion sampler in Boltz-2 to sample this conditioned distribution and demonstrate the utility of this approach, which does not require any additional training of the neural network, by implementing a diffusion analog of steered molecular dynamics simulations applied to mechanical systems. We can reproduce predicted stretched states of fragments of DNA, the muscle protein titin, and the inner-ear protocadherin-15 protein, as well as open states of the MscL ion channel consistent with experimental results. We expect that steered structure predictions will help sample underrepresented and non-equilibrium conformations for many macromolecular systems.

11
Simple baselines rival protein language models in mutation-dense design tasks

Talpir, I.; Fleishman, S. J.

2026-05-06 bioinformatics 10.64898/2026.05.01.722313 medRxiv
Top 0.2%
6.2%

Computational protein design demands generally applicable models that reliably predict or generate unmeasured variants with superior functional properties. Although protein language models (pLMs) have been used in zero-shot and transfer-learning design studies, they have generally not been assessed in benchmarks that explicitly test combinatorial extrapolation from lower- to higher-order variants. Here we benchmark widely used pLMs against conventional baseline methods in recently described dense, experimentally validated multi-mutant landscapes. We find that regardless of architecture and parameter count, pLMs are statistically similar to one another, and none consistently outperforms conventional baseline methods. Furthermore, their ability to distinguish functional from non-functional variants in zero-shot prediction is comparable to that of conventional homology-based methods. We suggest that to contribute significantly to the design of protein function, pLMs may need to encode biophysical and structural priors or be combined with structure-based approaches.

12
Does the sequence of a disordered protein encode small molecule binding paths?

Louet, A. A. B.; Hummer, G.; Vendruscolo, M.

2026-05-23 biophysics 10.64898/2026.05.20.726646 medRxiv
Top 0.2%
4.9%

Ligand binding to intrinsically disordered proteins resists description in terms of conventional binding pockets. It can instead be analysed as a dynamic process in which ligands move across transient surface interaction sites. Here we characterise a pathway-based representation in which ligand binding is described as a sequence of transitions between residue-defined microstates. This representation enables ligand-specific effects to be distinguished from intrinsic properties of the peptide conformational ensemble. We run all-atom molecular dynamics simulations of Aβ42 and the C-terminal region of α-synuclein in complex with chemically diverse small molecules. From these, we construct transition matrices that encode ligand movement across the peptide surface and use Markov state models to identify dominant binding pathways and relative binding propensities. Pairwise enrichment-factor and AUC analyses reveal strong conservation of the highest-ranked pathways across chemically diverse ligands. Enrichment factors reach 15-45 for the top-ranked states and AUC values are typically ≥0.75, well above random expectation. These dominant pathways are also preserved across changes in pH and temperature. By contrast, a urea control, included as a non-specific binder, shows reduced enrichment. This indicates that ligands primarily modulate pathway weights rather than define the underlying network topology. Ensemble docking across chemically diverse libraries further supports the presence of recurrent ligand-accessible hotspots within the peptide conformational ensemble. Building on this framework, we apply a prospective screening pipeline to Aβ42. It combines MSM-derived hotspots with sequence-based Ligand-Transformer scoring and Gnina docking across 1.66 million compounds and nominates 19 candidates for prospective experimental evaluation. Together, these results indicate that disordered protein sequences give rise to conformational ensembles that exhibit characteristic binding pathways for small molecules.
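Transition matrices of this kind can be estimated by counting transitions between residue-defined microstates at a fixed lag time and normalizing each row. The sketch below assumes the ligand's microstate has already been assigned in each frame as an integer label. It uses the simple row-normalized (non-reversible) estimator. Power iteration for stationary populations is shown as one way to read off relative state populations, not necessarily the authors' procedure.

```python
def transition_matrix(labels, n_states, lag=1):
    """Row-normalized transition counts between integer microstate labels.

    Rows for states never visited (zero counts) are left as all zeros.
    """
    counts = [[0.0] * n_states for _ in range(n_states)]
    for t in range(len(labels) - lag):
        counts[labels[t]][labels[t + lag]] += 1.0
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total for c in row] if total > 0 else [0.0] * n_states)
    return matrix


def stationary_distribution(matrix, n_iter=1000):
    """Power iteration p <- p T, renormalized each step (assumes an ergodic chain)."""
    n = len(matrix)
    p = [1.0 / n] * n
    for _ in range(n_iter):
        p = [sum(p[i] * matrix[i][j] for i in range(n)) for j in range(n)]
        s = sum(p)
        p = [x / s for x in p]
    return p
```

In practice the lag time is chosen where implied timescales level off, and a reversible estimator is often preferred. The counting step shown here is the common core of both.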

13
Temperature-Dependent Rotamer Population Shifts Govern Tryptophan Fluorescence in Proteins

Hsu, I.-S.; Chou, Y.-C.; Lee, Y.-T.; Wang, W.-H.; Tsai, M.-Y.

2026-05-23 biochemistry 10.64898/2026.05.22.726722 medRxiv
Top 0.2%
4.8%

Intrinsic tryptophan fluorescence is widely used as a sensitive reporter of protein conformational dynamics, yet the molecular origin of its temperature-dependent modulation remains unclear. Here we investigate the conformational dynamics of Trp134 in bovine serum albumin (BSA) using molecular dynamics (MD) simulations, free-energy calculations based on umbrella sampling and WHAM, quantum mechanical (QM) calculations, and QM/MM approaches. MD simulations show that the global structure of BSA remains stable while temperature induces a gradual population shift from the Ia+ to the Ia- rotamer. The corresponding free-energy landscapes reveal that this shift arises from subtle changes in basin stability and transition barriers along the rotameric coordinate. In contrast, standalone QM calculations on isolated tryptophan predict different energetic trends, highlighting the sensitivity of rotamer stability to electronic-structure treatments and environmental effects. QM/MM calculations partially reconcile these differences by incorporating the protein environment. Together, these results suggest that temperature reshapes the rotamer free-energy landscape of Trp134, leading to population shifts that modulate intrinsic tryptophan fluorescence in proteins.

14
Amino Acid Insertion Energetics in a POPC Bilayer from Unbiased Molecular Dynamics

Bories, S. C. A.; Lague, P.

2026-05-12 bioinformatics 10.64898/2026.05.07.723583 medRxiv
Top 0.2%
4.3%

Membrane association is governed by the thermodynamics of amino acid partitioning between water and the lipid bilayer. Here, we quantified amino acid side-chain insertion energetics in a 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine (POPC) bilayer using unbiased molecular dynamics simulations. Equilibrium depth distributions of 28 analogs, including multiple protonation states, were converted into potentials of mean force (PMFs) by Boltzmann inversion. The resulting PMFs reproduced the main features of bilayer partitioning. Hydrophobic analogs favored the bilayer core, aromatic analogs were stabilized in interfacial regions, and polar or charged analogs remained unfavorable in the hydrophobic interior. A diglycine analog representing the peptide backbone behaved similarly to uncharged polar residues. Depth-dependent pKa profiles and orientational analyses further showed how protonation equilibria and aromatic-ring alignment influence insertion energetics. Agreement with experimental hydrophobicity scales supports the robustness of the approach. These results provide an efficient and internally consistent framework for characterizing bilayer insertion energetics and establish a reference for future studies in more complex lipid environments.

[Figure 1]

SIGNIFICANCE: Membrane-associated proteins represent a large fraction of the proteome and include many major drug targets, yet quantitative understanding of their interactions with lipid bilayers remains limited. Here, we present an unbiased molecular dynamics framework for systematically determining amino acid side-chain insertion free energies in a model bilayer.
By deriving potentials of mean force directly from equilibrium depth distributions, this approach enables internally consistent comparisons across residue classes and protonation states without biasing restraints. The resulting free-energy profiles reproduce established hydrophobicity trends and show how protonation equilibria and aromatic-ring orientation modulate bilayer partitioning. This scalable strategy provides a quantitative reference for residue-level membrane thermodynamics and establishes a foundation for extending insertion energetics to more diverse lipid compositions and more complex membrane-associated systems.
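Boltzmann inversion converts an equilibrium depth histogram P(z) into a PMF, F(z) = −k_B T ln P(z) + C. The sketch below assumes equal-width bins and a temperature of 303.15 K, which the abstract does not state. The constant C is chosen so that the outermost bins (bulk water) sit at zero. The function name is hypothetical.

```python
import bisect
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def pmf_from_depths(depths, bin_edges, temperature=303.15, ref_bins=1):
    """PMF in kcal/mol from sampled depths z via F(z) = -kT ln P(z) + C.

    Assumes equal-width bins; C is set so that the mean over the `ref_bins`
    outermost bins on each side (taken as bulk water) is zero.
    Empty bins get +inf.
    """
    n_bins = len(bin_edges) - 1
    counts = [0] * n_bins
    for z in depths:
        b = bisect.bisect_right(bin_edges, z) - 1
        if 0 <= b < n_bins:
            counts[b] += 1
    total = sum(counts)
    kT = K_B * temperature
    pmf = [-kT * math.log(c / total) if c > 0 else math.inf for c in counts]
    reference = pmf[:ref_bins] + pmf[-ref_bins:]
    offset = sum(reference) / len(reference)
    return [f - offset for f in pmf]
```

The approach only works where sampling is adequate. Rarely visited depths, such as charged analogs in the bilayer core, give noisy or infinite values. This is why unbiased estimates of very unfavorable insertions need long trajectories.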

15
INTEGRATOR: Structural Elucidation of the INO80 Chromatin Remodeler via Experimentally Guided Molecular Simulations

Nde, J.; Panapitiya, G.; Cheung, M. S.; Maupin, C. M.; Sardiu, M. E.

2026-05-17 biophysics 10.64898/2026.05.15.725493 medRxiv
Top 0.3%
3.6%
Show abstract

The INO80 chromatin remodeling complex plays a central role in DNA repair, transcription, and replication. Yet a comprehensive understanding of its structural organization remains incomplete, because several of its subunits are dynamic and several are shared with related remodeling complexes. Here, we report a computational model of the three-dimensional structure of the S. cerevisiae INO80 complex. We build it with an integrative approach that combines experimental crosslinking mass spectrometry, molecular docking, and molecular dynamics simulations. Our results reveal the spatial and dynamical organization of key modules (ARP8, ARP5, NHP10, and RVB1/2) within the intact complex. The resulting structural model agrees with the crosslinking constraints and highlights the architecture of the previously uncharacterized NHP10 module. This module includes the C-terminal region of the Ino80 scaffolding protein. It has remained elusive owing to its intrinsic flexibility and the lack of high-resolution structural data. To facilitate this integrative modeling workflow and make it broadly accessible, we present INTEGRATOR (INTEGRAtive TempOral and stRuctural Analysis of protein modules). INTEGRATOR is a versatile workflow package that uses well-established software to elucidate the structure and dynamics of large, flexible macromolecular assemblies. Our findings demonstrate the power of integrative modeling in resolving the role of the highly disordered NHP10 module in recruiting other dynamic modules into the large INO80 assembly. They also offer a generalizable framework for determining the architecture of similarly complex and heterogeneous molecular machines. This work has broad implications for understanding the structural basis of chromatin regulation in microbial organisms and its dysregulation in diseases such as cancer.

16
Pathway Representation via Intrinsic Structural Medoids (PRISM): A Structural Mapping Approach to Clustering Molecular Pathways

Brylle Woody Santos, J.; Leung, J.; Chong, L.; Miranda Quintana, R. A.

2026-05-19 biophysics 10.64898/2026.05.16.725628 medRxiv
Top 0.3%
3.6%

We present Pathway Representation via Intrinsic Structural Medoids (PRISM), a state-aware framework for clustering pathways from molecular dynamics simulations of biomolecular transitions. In PRISM, each pathway is mapped to a small set of structural medoids obtained via a deterministic k-means clustering scheme. Pairwise pathway dissimilarities are computed using a weighted average Hausdorff distance between these representative sets. This distance captures mean nearest-neighbor structural deviations while reducing sensitivity to outliers. Hierarchical agglomerative clustering of the resulting dissimilarity matrix defines pathway families. We evaluate PRISM across three biomolecular transitions of increasing complexity: alanine dipeptide C7eq → C7ax isomerization, adenylate kinase opening, and HIF-2 PAS-B ligand unbinding. PRISM consistently yields robust cluster assignments, with medoids faithfully representing distinct conformational states. By combining a state-based description with robust geometric dissimilarities, PRISM provides a scalable framework for organizing complex transition pathways.
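The pathway dissimilarity can be sketched as a weighted average Hausdorff distance between two medoid sets. The sketch assumes medoids are feature vectors compared with Euclidean distance, as a stand-in for a structural metric such as RMSD. It also assumes each medoid carries a weight such as its cluster population. PRISM's exact weighting scheme is not described in the abstract, so the function names and weighting are hypothetical.

```python
import math


def directed_avg_distance(A, weights_a, B):
    """Weighted mean, over medoids a in A, of the distance to the nearest medoid in B."""
    total = sum(weights_a)
    return sum(w * min(math.dist(a, b) for b in B) for a, w in zip(A, weights_a)) / total


def weighted_avg_hausdorff(A, weights_a, B, weights_b):
    """Symmetric average of the two directed weighted mean nearest-neighbour distances.

    Averaging, rather than taking the maximum as the classic Hausdorff
    distance does, reduces sensitivity to a single outlying medoid.
    """
    return 0.5 * (
        directed_avg_distance(A, weights_a, B) + directed_avg_distance(B, weights_b, A)
    )
```

The resulting pairwise matrix can then be passed to any hierarchical agglomerative clustering routine to group pathways into families, as the abstract describes.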

17
Drug design using unique conformations to preferentially target a specific site on collagen-bound MMP1

Sarkar, S. K.; Nash, A.; Harms, C.

2026-05-17 biophysics 10.64898/2026.05.14.725194 medRxiv
Top 0.3%
3.5%

Precise site-specific drug design remains a challenge in structure-based drug discovery. Most existing approaches screen for ligands that target binding pockets on a protein surface. These pockets are based on static structures obtained from techniques such as X-ray crystallography, NMR, cryo-EM, and AlphaFold. However, the structure-function paradigm is, in reality, a structure-dynamics-function relationship that determines a protein's binding and activity. Drug screening or design is therefore incomplete if it does not evaluate binding competition across the protein surface or consider the receptor's dynamic, substrate-dependent conformational states. Substrate-specific protein conformations are underexplored and offer novel opportunities for selective therapeutic targeting, although systematic workflows for identifying and exploiting such sites remain limited. Previously, we showed that collagen alters matrix metalloprotease-1 (MMP1) dynamics. We also showed that R405 is an allosteric residue on the MMP1 surface that exhibits strong dynamic correlations with its active site. Here, we present a substrate-specific allosteric drug-design framework that targets specific sites on a protein, using collagen-bound MMP1 as a model system. We determined the conformational dynamics of free and collagen-bound MMP1 using all-atom molecular dynamics (MD) simulations and grouped similar conformations into clusters. We then identified conformations that occur only in collagen-bound MMP1 and designed drugs against them using a machine-learning approach. The top three unique clusters were used to generate approximately 150,000 candidate compounds. These were screened against both the R405-centered region and all detectable binding pockets across the MMP1 surface. We found several compounds that bind preferentially around R405 by at least 0.3 kcal/mol relative to competing sites across the surface.
This strategy establishes a generalizable framework for designing ligands that preferentially target substrate-specific allosteric sites, providing new opportunities for precision therapeutics that modulate proteins in their biologically relevant functional states.

Simple Summary: In this paper, we establish a substrate-specific allosteric drug-design strategy that integrates all-atom molecular dynamics simulations, conformational clustering, machine-learning-based ligand design, and surface-wide binding-selectivity screening, using collagen-bound MMP1 as a model system. We show that collagen binding reshapes the conformational ensemble of MMP1, creating unique conformational states that are absent or inaccessible in the free enzyme. By identifying these substrate-specific conformations, generating ligands based on the corresponding dynamic fingerprints around the collagen-specific allosteric residue R405, and screening compounds across all binding pockets on the MMP1 surface, we demonstrate preferential targeting of the collagen-specific site relative to competing pockets. These results establish a generalizable framework for designing ligands that selectively recognize biologically relevant substrate-bound conformations rather than static protein structures alone. Substrate-specific allosteric targeting may enable selective modulation of individual protein functions while minimizing off-target interactions, providing new opportunities for precision therapeutics against dynamic protein systems.

18
Fast and Ultra-Capable Protein Design: Advancing the Frontier Through Atomistic SE(3)-Equivariance with Genie 3

Lin, Y.; Lee, M.; Vermani, A.; Jiang, E.; De Cooman, S.; Spetko, M.; AlQuraishi, M.

2026-05-05 bioinformatics 10.64898/2026.05.01.722168 medRxiv
Top 0.3%
3.5%

Despite the breakneck pace of progress in protein design methodology, frontier problems remain challenging, with leading methods struggling to design high-affinity binders, scaffold multiple functional motifs, or stabilize large multi-domain proteins. Recent research efforts have focused on two areas: improving model reasoning when generating active sites or binding interfaces, and improving concordance between the design process and the in silico oracle used to select promising designs. In addressing the first, the field has shifted towards all-atom models that capture sidechain conformations in atomistic detail by eschewing data-efficient SE(3)-equivariance, mirroring the evolution of AlphaFold2 to AlphaFold3. In addressing the second, recent work has focused on replacing generative models employing diffusion or flow-matching with hallucination approaches that directly optimize the oracle in sequence space; this improves success rates but reduces computational efficiency. Here, we close and surpass the generation-hallucination gap by revisiting SE(3)-equivariance using a branched polymer treatment of protein structures. The resulting diffusion model, Genie 3, achieves state-of-the-art performance on binder design, motif scaffolding, and unconditional generation, while being significantly faster than the best existing methods. We use Genie 3 to design a nanomolar binder of Nipah Glycoprotein G, a tetramer with minimal structural or biophysical characterization, as part of the Adaptyv Bio Nipah Competition, achieving a 12.5% success rate. Taken together, our results present a new frontier in protein design capability and a reexamination of the role of SE(3)-equivariance in molecular modeling.
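SE(3)-equivariance means that rotating and translating the input coordinates rotates and translates the model's output in exactly the same way. The abstract does not describe Genie 3's branched-polymer architecture, so the sketch below uses a generic, simplified EGNN-style coordinate update as a stand-in. It is not Genie 3's layer. It numerically checks that transforming and then updating gives the same result as updating and then transforming. The distance weighting `exp(-d^2)` and the `weight` parameter are arbitrary illustrative choices.

```python
import math
import random


def egnn_update(coords, weight=0.1):
    """Simplified EGNN-style update: x_i' = x_i + weight * sum_j (x_i - x_j) * phi(|x_i - x_j|^2).

    phi depends only on distances (invariant), and the update is built from
    relative vectors, so the layer commutes with rotations and translations.
    """
    phi = lambda d2: math.exp(-d2)  # illustrative invariant weighting
    out = []
    for i, xi in enumerate(coords):
        delta = [0.0, 0.0, 0.0]
        for j, xj in enumerate(coords):
            if i == j:
                continue
            diff = [a - b for a, b in zip(xi, xj)]
            w = phi(sum(d * d for d in diff))
            delta = [d + w * c for d, c in zip(delta, diff)]
        out.append([a + weight * d for a, d in zip(xi, delta)])
    return out


def random_rotation(rng):
    """Proper rotation matrix (det = +1) from a random unit quaternion."""
    q = [rng.gauss(0.0, 1.0) for _ in range(4)]
    n = math.sqrt(sum(v * v for v in q))
    w, x, y, z = (v / n for v in q)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def transform(coords, R, t):
    """Apply x -> R x + t to every point."""
    return [[sum(R[r][c] * p[c] for c in range(3)) + t[r] for r in range(3)] for p in coords]


def max_equivariance_error(coords, seed=0):
    """Largest coordinate mismatch between f(g.x) and g.f(x) for a random SE(3) element g."""
    rng = random.Random(seed)
    R = random_rotation(rng)
    t = [rng.uniform(-5.0, 5.0) for _ in range(3)]
    a = egnn_update(transform(coords, R, t))  # transform, then update
    b = transform(egnn_update(coords), R, t)  # update, then transform
    return max(abs(u - v) for p, q in zip(a, b) for u, v in zip(p, q))
```

Running `max_equivariance_error` on a few points should return a value at floating-point round-off level. This illustrates the property, not the data efficiency the abstract attributes to it. The same layer is also reflection-equivariant (E(3)), which is a stronger condition than SE(3).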

19
Structural bias in machine learning-guided peptide design

Aldas-Bulos, V. D.; Plisson, F.

2026-05-08 bioinformatics 10.64898/2026.05.06.721805 medRxiv
Top 0.4%
2.8%

Machine learning continues to accelerate peptide and protein design through the rapid prediction and generation of sequences with desired characteristics. Many applications focus on predicting properties, functions, and structures, as well as generating point mutations and de novo designs. Nevertheless, many models prove less generalizable than initially claimed. Most predictors and generators are trained on sequence datasets, where imbalances can be addressed during preprocessing. In contrast, structural bias, a subtype of algorithmic bias arising from uneven representation of structural classes in training datasets, and the limitations of early protein structure predictors have frequently remained undetected and uncorrected. The recent surge in powerful protein structure prediction tools, such as the AlphaFold and RoseTTAFold series and their variants, now presents opportunities to mitigate this issue. We hypothesize that such structural sampling biases influence the downstream performance of ML models. Using antimicrobial peptides as a case study, we audited the structural biases in 16 state-of-the-art predictors of antimicrobial activity and tested whether structural information constrains their predictions. Our analysis revealed that models trained explicitly on sequence data still produce predictions biased by uneven fold representation and data leakage. These findings highlight the importance of integrating balanced structural data or implementing bias-mitigating strategies to develop agnostic models that maximize bioactive protein discovery and multi-objective optimization.
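The abstract does not state which metrics the audit uses. One minimal way to make "uneven representation of structural classes" concrete is to measure how evenly fold labels are spread across a training set, and then check whether a predictor's recall differs by fold. The sketch below does both. It assumes each peptide already carries a structural-class label (for example, one derived from a predicted structure) and binary activity labels. The function names and class labels are hypothetical.

```python
import math
from collections import Counter


def class_balance(fold_labels):
    """Fraction of each structural class plus normalized Shannon entropy.

    Entropy is divided by log(number of classes): 1.0 means perfectly even
    representation, and values near 0.0 mean one class dominates.
    """
    counts = Counter(fold_labels)
    total = sum(counts.values())
    fractions = {k: v / total for k, v in counts.items()}
    if len(counts) < 2:
        return fractions, 0.0  # a single class carries no diversity
    h = -sum(p * math.log(p) for p in fractions.values())
    return fractions, h / math.log(len(counts))


def per_class_recall(fold_labels, y_true, y_pred):
    """Recall on truly active peptides (label 1), broken down by structural class."""
    hits, positives = Counter(), Counter()
    for fold, t, p in zip(fold_labels, y_true, y_pred):
        if t == 1:
            positives[fold] += 1
            hits[fold] += int(p == 1)
    return {f: hits[f] / positives[f] for f in positives}
```

A training set of six helical and two beta-sheet peptides gives fractions of 0.75/0.25 and a normalized entropy of about 0.81. A large recall gap between classes in held-out data would be one symptom of the bias the authors describe.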

20
Structure and Dynamics of the HIV-1 Envelope Protein on the Virion Envelope

Majumder, A.; Dutta, M.; Cherek, L.; Voth, G. A.

2026-05-18 biophysics 10.64898/2026.05.18.725998 medRxiv
Top 0.4%
2.6%

HIV-1 buds from infected cells as immature virion particles with a scattered envelope glycoprotein (Env) distribution on their envelope. It then undergoes maturation, during which the viral protease cleaves the Gag polyprotein at multiple sites, leading to structural reorganization of the viral particle and lateral redistribution of Env proteins, ultimately rendering the virion infectious. However, the underlying mechanism of maturation-induced Env reorganization remains elusive. In this study, we combine microsecond-long all-atom (AA) and bottom-up coarse-grained (CG) molecular dynamics simulations with diffusion-model-based backmapping to investigate the structural organization and key interactions of Env in viral membranes. AA simulations of fully glycosylated Env embedded in HIV-1-mimetic asymmetric bilayers were first performed to characterize its conformational dynamics and Env-lipid interactions. We then developed a bottom-up CG model of glycosylated Env from those AA data and simulated the mature HIV-1 virion envelope containing multiple Env proteins. The CG simulations predict that Env proteins form clusters through interactions mediated by the cytoplasmic tail domain (CTD) and adopt diverse tilted conformations within these clusters. These CG simulations were then backmapped to AA resolution, and further AA simulations were carried out to identify, in detail, the specific interacting residues in the Env clusters. Additionally, analysis of epitope accessibility shows that broadly neutralizing antibodies (bnAbs) targeting the V1/V2 and V3 loops may efficiently interact with Env clusters on the mature virion surface. Together, these results provide a molecular mechanism for Env oligomerization during viral maturation and offer new insights into the accessibility of bnAb epitopes on Env clusters.
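The abstract describes a "bottom-up" CG model built from AA data but does not specify the mapping or the fitting method. A common first step in bottom-up coarse-graining is a linear mapping that places each CG bead at the center of mass of a group of atoms. For non-overlapping groups, the corresponding reference force on a bead is the sum of its atoms' forces, which force-matching approaches then fit. The sketch below shows only that mapping step under those assumptions. The function names and bead groupings are hypothetical and are not the authors' Env model.

```python
def com_map(coords, masses, bead_groups):
    """Map atomistic coordinates to CG beads placed at each group's center of mass.

    coords: list of (x, y, z) per atom; masses: mass per atom;
    bead_groups: list of atom-index lists, one per bead (hypothetical, non-overlapping).
    """
    beads = []
    for group in bead_groups:
        m_tot = sum(masses[i] for i in group)
        beads.append(tuple(
            sum(masses[i] * coords[i][k] for i in group) / m_tot for k in range(3)
        ))
    return beads


def map_forces(forces, bead_groups):
    """Reference CG forces for a non-overlapping center-of-mass mapping:
    each bead force is the sum of the atomic forces in its group."""
    return [
        tuple(sum(forces[i][k] for i in group) for k in range(3))
        for group in bead_groups
    ]
```

Applied frame by frame to an AA trajectory, these two functions produce the mapped coordinates and forces that a bottom-up CG parameterization would typically take as input.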